Arabic Text Classification Using Maximum Entropy
Abstract
In organizations, a large amount of information exists in text documents. It is therefore important to use text mining to discover knowledge from this unstructured data. Automatic text classification is considered one of the important applications of text mining. It is the process of assigning a text document to one or more predefined categories based on its content. This paper focuses on classifying Arabic text documents. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In our approach, we first preprocess the data using natural language processing techniques such as tokenization, stemming, and part-of-speech tagging. We then use the maximum entropy method to classify Arabic documents. We evaluated our approach on real data and compared the results with other existing systems.
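The pipeline described in the abstract (tokenize, then classify with maximum entropy) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the features, stemmer, tagger, or training procedure, so the whitespace tokenizer, the binary word-presence features, the plain gradient-ascent trainer, the toy documents, and all function names below are assumptions. It relies on the standard result that a conditional maximum entropy classifier over such features is equivalent to multinomial logistic regression.

```python
import math
from collections import defaultdict

# Toy training set (hypothetical, for illustration only).
TRAIN_DOCS = [
    "مباراة كرة القدم الفريق",
    "الفريق فاز في المباراة",
    "البنك سعر الفائدة الاقتصاد",
    "الاقتصاد سوق الأسهم البنك",
]
TRAIN_LABELS = ["sports", "sports", "economy", "economy"]


def tokenize(text):
    """Whitespace tokenizer; stemming and POS tagging are omitted here."""
    return text.split()


def predict_proba(weights, classes, tokens):
    """Return p(class | tokens) as a softmax over summed feature weights."""
    scores = {c: sum(weights.get((c, t), 0.0) for t in tokens) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Fit one weight per (class, token) pair by batch gradient ascent
    on the conditional log-likelihood of the training labels."""
    classes = sorted(set(labels))
    feats = [set(tokenize(d)) for d in docs]  # binary presence features
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for tokens, gold in zip(feats, labels):
            probs = predict_proba(weights, classes, tokens)
            for c in classes:
                # Gradient = observed feature count - expected feature count.
                delta = (1.0 if c == gold else 0.0) - probs[c]
                for t in tokens:
                    grad[(c, t)] += delta
        for key, g in grad.items():
            weights[key] += lr * g / len(docs)
    return weights, classes


def classify(weights, classes, text):
    """Return the most probable class for a raw text string."""
    probs = predict_proba(weights, classes, set(tokenize(text)))
    return max(probs, key=probs.get)


weights, classes = train_maxent(TRAIN_DOCS, TRAIN_LABELS)
print(classify(weights, classes, "فاز الفريق"))
print(classify(weights, classes, "سعر سوق الأسهم"))
```

In this toy data, "مباراة" and "المباراة" are treated as unrelated features because no stemming is applied, which hints at why the abstract lists stemming as a preprocessing step for a highly inflectional language.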
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Beyond its own merits, text classification (TC) has become a cornerstone of many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up classification for any text categorization task. It also serves as a tool for...
Arabic Opinion Mining Using Combined Classification Approach
In this paper, we present a combined approach that automatically extracts opinions from Arabic documents. Most research efforts in opinion mining deal with English texts, and little work addresses Arabic text. Unlike with English, our experiments showed that using only one method on opinionated Arabic documents produces poor performance. We therefore used a combined approach that consists of ...
Summary of Text Categorization based on Maximum Entropy Model
Since the 1990s, the maximum entropy model has been used in text categorization, and it has achieved good results in natural language processing since its framework and algorithm were established. Scholars have built on the maximum entropy model, improving it and studying it in more depth. Using the maximum entropy model for text sentiment categorization has become a hot research topic in recent years. In thi...
A Comparative Study on Arabic Text Classification
This paper focuses on automatic Arabic text classification. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. Many experimental results on classifying Arabic text have been published. Because these results come from different datasets, authors, and evaluation metrics, the performance of the tested classifiers cannot be compared directly. In this pape...
Classifying and Segmenting Classical and Modern Standard Arabic using Minimum Cross-Entropy
Text classification is the process of assigning a text or document to one or more predefined classes or categories that reflect its content. Despite the rapid growth of Arabic text on the Web, studies that address the classification and segmentation of Arabic remain limited compared with other languages, most of which implement word-based and feature extraction algorithms. This ...